48 research outputs found

    Privacy Preservation by Disassociation

    Full text link
    In this work, we focus on protection against identity disclosure in the publication of sparse multidimensional data. Existing multidimensional anonymization techniquesa) protect the privacy of users either by altering the set of quasi-identifiers of the original data (e.g., by generalization or suppression) or by adding noise (e.g., using differential privacy) and/or (b) assume a clear distinction between sensitive and non-sensitive information and sever the possible linkage. In many real world applications the above techniques are not applicable. For instance, consider web search query logs. Suppressing or generalizing anonymization methods would remove the most valuable information in the dataset: the original query terms. Additionally, web search query logs contain millions of query terms which cannot be categorized as sensitive or non-sensitive since a term may be sensitive for a user and non-sensitive for another. Motivated by this observation, we propose an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record. We protect the users' privacy by disassociating record terms that participate in identifying combinations. This way the adversary cannot associate with high probability a record with a rare combination of terms. To the best of our knowledge, our proposal is the first to employ such a technique to provide protection against identity disclosure. We propose an anonymization algorithm based on our approach and evaluate its performance on real and synthetic datasets, comparing it against other state-of-the-art methods based on generalization and differential privacy.Comment: VLDB201

    Content Recommendation for Viral Social Influence

    Get PDF

    Parallel In-Memory Evaluation of Spatial Joins

    Full text link
    The spatial join is a popular operation in spatial database systems and its evaluation is a well-studied problem. As main memories become bigger and faster and commodity hardware supports parallel processing, there is a need to revamp classic join algorithms which have been designed for I/O-bound processing. In view of this, we study the in-memory and parallel evaluation of spatial joins, by re-designing a classic partitioning-based algorithm to consider alternative approaches for space partitioning. Our study shows that, compared to a straightforward implementation of the algorithm, our tuning can improve performance significantly. We also show how to select appropriate partitioning parameters based on data statistics, in order to tune the algorithm for the given join inputs. Our parallel implementation scales gracefully with the number of threads reducing the cost of the join to at most one second even for join inputs with tens of millions of rectangles.Comment: Extended version of the SIGSPATIAL'19 paper under the same titl

    Two-layer Space-oriented Partitioning for Non-point Data

    Full text link
    Non-point spatial objects (e.g., polygons, linestrings, etc.) are ubiquitous. We study the problem of indexing non-point objects in memory for range queries and spatial intersection joins. We propose a secondary partitioning technique for space-oriented partitioning indices (e.g., grids), which improves their performance significantly, by avoiding the generation and elimination of duplicate results. Our approach is easy to implement and can be used by any space-partitioning index to significantly reduce the cost of range queries and intersection joins. In addition, the secondary partitions can be processed independently, which makes our method appropriate for distributed and parallel indexing. Experiments on real datasets confirm the advantage of our approach against alternative duplicate elimination techniques and data-oriented state-of-the-art spatial indices. We also show that our partitioning technique, paired with optimized partition-to-partition join algorithms, typically reduces the cost of spatial joins by around 50%.Comment: To appear in the IEEE Transactions on Knowledge and Data Engineerin

    On-Line Discovery of Hot Motion Paths

    Get PDF
    We consider an environment of numerous moving objects, equipped with location-sensing devices and capable of communicating with a central coordinator. In this setting, we investigate the problem of maintaining hot motion paths, i.e., routes frequently followed by multiple objects over the recent past. Motion paths approximate portions of objects' movement within a tolerance margin that depends on the uncertainty inherent in positional measurements. Discovery of hot motion paths is important to applications requiring classification/profiling based on monitored movement patterns, such as targeted advertising, resource allocation, etc. To achieve this goal, we delegate part of the path extraction process to objects, by assigning to them adaptive lightweight filters that dynamically suppress unnecessary location updates and, thus, help reducing the communication overhead. We demonstrate the benefits of our methods and their efficiency through extensive experiments on synthetic data sets

    Privacy-Preserving Release of Spatio-temporal Density

    Get PDF
    International audienceIn today’s digital society, increasing amounts of contextually rich spatio-temporal information are collected and used, e.g., for knowledge-based decision making, research purposes, optimizing operational phases of city management, planning infrastructure networks, or developing timetables for public transportation with an increasingly autonomous vehicle fleet. At the same time, however, publishing or sharing spatio-temporal data, even in aggregated form, is not always viable owing to the danger of violating individuals’ privacy, along with the related legal and ethical repercussions. In this chapter, we review some fundamental approaches for anonymizing and releasing spatio-temporal density, i.e., the number of individuals visiting a given set of locations as a function of time. These approaches follow different privacy models providing different privacy guarantees as well as accuracy of the released anonymized data. We demonstrate some sanitization (anonymization) techniques with provable privacy guarantees by releasing the spatio-temporal density of Paris, in France. We conclude that, in order to achieve meaningful accuracy, the sanitization process has to be carefully customized to the application and public characteristics of the spatio-temporal data

    Privacy Preservation in the Dissemination of Location Data

    No full text
    The rapid advance in handheld communication devices and the appearance of smartphones has allowed users to connect to the Internet and surf on the WWW while they are moving around the city or traveling. Location based services have been developed to deliver content that is adjusted to the current user location. Social networks have also responded to the challenge of users who can access the Internet from any place in the city, and location based social-networks like Foursquare have become very popular in a short period of time. The popularity of these applications is linked to the significant advantages they offer: users can exploit live location-based information to take dynamic decisions on issues like transportation, identification of places of interest or even on the opportunity to meet a friend or an associate in nearby locations. A side effect of sharing location-based information is that it exposes the user to substantial privacy related threats. Revealing the user’s location carelessly can prove to be embarrassing, harmful professionally, or even dangerous. Research in the data management field has put significant effort on anonymization techniques that obfuscate spatial information in order to hide the identity of the user or her exact location. Privacy guaranties and anonymization algorithms become increasingly sophisticated offering better and more efficient protection in data publishing and data exchange. Still, it is not clear yet what are the greatest dangers to user privacy and which are the most realistic privacy breaching scenarios. The aim of the paper is to provide a brief survey of the attack scenarios, the privacy guaranties and the data transformations employed to protect user privacy in real time. The paper focuses mostly on providing an overview of the privacy models that are investigated in literature and less on the algorithms and their scaling capabilities. The models and the attack scenarios are classified and compared, in order to provide an overview of the cases that are covered by existing research. 1
    corecore